Xinggang Wang

PixelHacker: Image Inpainting with Structural and Semantic Consistency

Apr 30, 2025

STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting

Apr 25, 2025

MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling

Mar 17, 2025

Towards Fast, Memory-based and Data-Efficient Vision-Language Policy

Mar 13, 2025

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

Mar 13, 2025

OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models

Mar 11, 2025

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

Mar 10, 2025

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Feb 18, 2025

Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation

Feb 18, 2025

SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild

Jan 07, 2025